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Abstract 

The binary Euclidean algorithm is a variant of the classical Euclidean algorithm. It 
avoids multiplications and divisions, except by powers of two, so is potentially faster than 
■ the classical algorithm on a binary machine. 

CN ' 

We describe the binary algorithm and consider its average case behaviour. In particular, 
, we correct some errors in the literature, discuss some recent results of Vallee, and describe a 



(N 



X 



numerical computation which supports a conjecture of Vallee. 



1 Introduction 



\ In §2 we define the binary Euclidean algorithm and mention some of its properties, history and 

Q ' generalisations. Then, in §3 we outline the heuristic model which was first presented in 1976 [4]. 

■ Some of the results of that paper are mentioned (and simplified) in §4. 

o ■ 

Average case analysis of the binary Euclidean algorithm lay dormant from 1976 until Brigitte 
Vallee's recent analysis [29, 30]. In §§5-6 we discuss Vallee's results and conjectures. In §8 we 
, give some numerical evidence for one of her conjectures. Some connections between Vallee's 

' results and our earlier results are given in §7. 

!>; . Finally, in §9 we take the opportunity to point out an error in the 1976 paper [4]. Although 

the error is theoretically significant and (when pointed out) rather obvious, it appears that no 
' one noticed it for about twenty years. The manner of its discovery is discussed in §9. Some open 

ff^ ' problems are mentioned in §10. 



1.1 Notation 

lg(x) denotes log2(x). N,n, a, k,u,v are positive integers. 

Val2(n) denotes the dyadic valuation of the positive integer u, i.e. the greatest integer j such 
that 2^ I u. This is just the number of trailing zero bits in the binary representation of u. 

f,g,F,F,G are functions of a real or complex variable, and usually f{x) = F'(x), g{x) = 
G'{x) etc. Often /, g are probability densities and F, G are the corresponding probability distri- 
butions. 

Warning: Brent [4], Knuth [20], and Vallee [27, 29, 30] use incompatible notation. Knuth uses 
G{x) for our F{x), and S{x) for our G{x). Vallee sometimes interchanges our / and g. 
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2 The Binary Euclidean Algorithm 



The idea of the binary EucHdean algorithm is to avoid the "division" operation r m mod n 
of the classical algorithm, but retain O (log TV) worst (and average) case. 

We assume that the algorithm is implemented on a binary computer so division by a power 
of two is easy. In particular, we assume that the "shift right until odd" operation 

or equivalently 

while even(ti) do u 

can be performed in constant time, although time 0(Val2(u)) would be sufficient. 

2.1 Definitions of the Binciry Euclidean Algoritiim 

There are several almost equivalent ways to define the algorithm. It is easy to take account of 
the largest power of two dividing the inputs, using the relation 

GCB{u,v) = 2"ii^(Vai2(«),Vai2M) (u/2^^'2H^^/2^^'2M^ ^ 

so for simplicity we assume that u and v are odd positive integers. Following is a simplified 
version of the algorithm given in Knuth [20, §4.5.2]. 

Algorithm B 

Bl. t |ii — 

if t = terminate with result u 

B2. t^t/2Va'2(t) 

B3. if u>v then u t else v t; 
go to Bl. 

2.2 History 

The binary Euclidean algorithm is usually attributed to Silver and Terzian [25] or (indepen- 
dently) Stein [26] in the early 1960s. However, it seems to go back much further. Knuth [20, 
§4.5.2] quotes a translation of a first-century AD Chinese text Chiu Chang Suan Shu on how to 
reduce a fraction to lowest terms: 

If halving is possible, take half. 

Otherwise write down the denominator and the numerator, 
and subtract the smaller from the greater. 

Repeat until both numbers are equal. 

Simplify with this common value. 

This is essentially Algorithm B. Hence, the binary algorithm is almost as old as the classical 
Euclidean algorithm [11]. 



2 



2.3 The Worst Case 



Although this paper is mainly concerned with the average case behaviour of the binary Euclidean 

algorithm, we mention the worst case briefly. At step Bl, u and v are odd, so t is even. Thus, 
step B2 always reduces t by at least a factor of two. Using this fact, it is easy to show that 
lg(n + v) decreases by at least one each time step B3 is executed, so this occurs at most 

l\g{u + v)\ 

times [20, exercise 4.5.2.37]. Thus, if N = max(n, step B3 is executed at most 

Ig(iV) + 0(1) 

times. 

Even if step B2 is replaced by single-bit shifts 

while even(i) do i i/2 

the overall worst case time is still O(logiV). In fact, it is easy to see that {lg{u) +\g{v)) decreases 
by at least one for each right shift, so the number of right shifts is at most 21g(Ar). 

2.4 The Extended Binary Algorithm 

It is possible to give an extended binary GCD algorithm which computes integer multipliers a 
and P such that 

au + Pv = GCT){u,v) . 

Let n = \\gu] + [Igv] be the number of bits in the input. Purdy [22] gave an algorithm with 
average running time 0(n) but worst case of order n^. This was improved by Bojanczyk and 
Brent [2], whose algorithm has worst case running time 0{n). 

Let g = GCD{u,v), u' = u/g, v' = v/g. In [2, §4] an algorithm is given for reducing the 
fraction u/v to u' /v' without performing any divisions (except by powers of two). 

2.5 Parallel Variants and the Class NC 

There is a systolic array variant of the binary GCD algorithm (Brent and Kung [7] ) . This takes 
time O(logiV) using 0(log A/") 1-bit processors. The overall bit-complexity is 0{{\ogN)'^). 

For n-bit numbers the systolic algorithm gives time 0(n) using 0{n) processors. This is 
close to the best known parallel time bound (Borodin et al [3]). 

It is not known if GCD is in the class NC.^ This is an interesting open problem because the 
basic arithmetic operations of addition, multiplication, and division are in NC (see Cook [8, 9]). 
Thus, the operation of computing CCDs is perhaps the simplest arithmetic operation which is 
not known to be in NC. 

It is conceivable that testing coprimality, i.e. answering the question of whether 
GCD(?x, u) = 1, is "easier" than computing GCD(u, f) in general. There is evidence that testing 
coprimality is in NC (Litow [21]). 

^NC is the class of problems which can be solved in parallel in time bounded by a polynomial in log L, where 

L is the length of the input, using a number of processors bounded by a polynomial in L. Note that for the GCD 
problem L = Q{n) = QilogN), so we are asking for a time polynomial in log log A'^, not in logN. 
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3 A Heuristic Continuous Model 



To analyse the expected behaviour of Algorithm B, we can follow what Gauss [15] did for the 
classical algorithm. This was first attempted in [4]. There is a summary in Knuth [20, §4.5.2]. 

Assume that the initial inputs uq, vq to Algorithm B are uniformly and independently dis- 
tributed in (0,A^), apart from the restriction that they are odd. Let {un,Vn) be the value of 
{u, v) after n iterations of step B3. 

Let 

_ m.m{Un,Vn) 

max(u„,i;„} 

and let Fn{x) be the probability distribution function of Xn (in the limit as N ^ oo). Thus 
Fo{x) = X ioi X e [0, 1]. 



3.1 A Plausible Assumption 

We make the assumption'^ that Val2(t) takes the value k with probability 2"^^ at step B2. The 
assumption is plausible because Val2(t) at step B2 depends on the least significant bits of u and 
V, whereas the comparison at step B3 depends on the most significant bits, so one would expect 
the steps to be (almost) independent when N is large. In fact, this independence is exploited 
in the systolic algorithms [2, 6, 7] where processing elements perform operations on the least 
significant bits without waiting for information about the most significant bits. 



3.2 The Recurrence for F„ 

Consider the effect of steps B2 and B3. We can assume that initially u > v, so t = u — v. If 
Val2(t) = k then X = v/u is transformed to 



X' = min 
It follows that X' <x iS 



'u-v 2''v \ . fl-X 2^X 



mm 



2^v ' u-v \ 2^X ' 1 - X 



X < or X > ^ 



l + 2'=/a; 1 + 2^=3 

Thus, the recurrence for Fn{x) is 



F„„(.) = 1 + Z 2-' (f„ (^) - F„ (^)) (1) 



with initial condition Fq{x) = x for x G [0, 1]. 
It is convenient to define 



Fn{x) = 1 - Fn{x) . 



The recurrence for Fn{x) is 

f„«w = i;2-'(f„(^)-f„(^ 



2k 



X 



(2) 



k>l 

and Fo{x) = 1 — a; for x G [0, 1]. 

^Vallcc does not make this assumption. Her results are mentioned in §§5-6. They show that the assumption 
is correct in the hmit as iV — >■ oo. 
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3.3 The Recurrence for 

Differentiating the recurrence (1) for F„ we obtain (formally) a recurrence for the probability 
density = F^{x): 

]^ \ 2 / ^ X / 1 \ 2 



It was noted in [4, §5] that the coefHcients in this recurrence are positive, and that the recurrence 
preserves the Li norm of nonnegative functions (this is to be expected, since the recurrence maps 
one probability density to another). 

3.4 Operator Notation 

The recurrence for /„ may be written as 

fn+i = ^2/n, 

where the operator B2 is the case s = 2 of a more general operator Bg which is defined in (14) 
of §5.4. 

4 Conjectured and Empirical Results 

In the 1976 paper [4] we gave numerical and analytic evidence (but no proof) that Fn{x) con- 
verges to a limiting distribution F{x) as n — )■ 00, and that fn{x) converges to the corresponding 
probability density f{x) = F'{x) (note that / = so / is a "fixed point" of the operator B2)- 

Assuming the existence of F, it is shown in [4] that the expected number of iterations of 
Algorithm Bis^iflgATasiV— >-oo, where K = 0.705 ... is a constant given by 

K = \n2/E^, (4) 

and^ 

/ 1-2-'^ \ 1 

^A;=2 



£;oo = ln2 + [\y( — ^ \ I - , ^ J F(x) dx . (5) 
io l£lll + (2'=-l)a;i 2(l + x)i ^> ^' 



4.1 A Simplification 

We can simplify the expressions (4)-(5) for K to obtain 

K = 2/h, (6) 

where ^ 

6 = 2-/ lg(l - x)f{x) dx . (7) 
Jo 

Using integration by parts we obtain an equivalent expression 

-■1 1 - F{x) 



6 = 2 + 



1 /■! 1 - F(x) , 

- — / ^—^ dx . (8 

n2 Jo 1 - X ^ ^ 



For my direct proof of (7)-(8), see Knuth [20, §4.5.2]. The idea is to consider the expected change 
in lg(uv) with each iteration of Algorithm B (to obtain the equivalent but more complicated 
expression (5) we considered the expected change in lg(u + v)). 



We have corrected a typo in [4, eqn. (6.3)]. 
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5 Another Formulation — Algorithm V 

It will be useful to rewrite Algorithm B in the following equivalent form (using pseudo-Pascal): 

Algorithm V { Assume u < v } 

while u ^ V do 
begin 

while u < V do 
begin 

j ^ Va.h{v - u); 

V -h- {v - u)/2^; 

end; 
u v; 
end; 
return u. 

5.1 Continued Fractions 

Vallee [30] shows a connection between Algorithm V and continued fractions of a certain form 

u 1 



ai + 



a2 + 



+ 



which by convention we write as 

- = 1/ai + 2^=702 + 2^=2/ . . . /(a, + 2^^) . (9) 

V 

Here aj is odd, kj > 0, and < aj < 2^^ (excluding the trivial case u = v = 1). 
5.2 Some Details of Vallee's Results 

Algorithm V has two nested loops. The outer loop exchanges u and v. Between two exchanges, 
the inner loop performs a sequence of subtractions and shifts which can be written as 

V u + 2^''vi; 

Vl ^ U + 2^^V2\ 
Vm-l U + 2^"'Vm 

with Vm < u. 

If xo = u/v at the beginning of an inner loop, the effect of the inner loop followed by an 
exchange is the rational x\ = Vm/u defined by 

1 

Xq 



a + 2^X1 ' 
where a is an odd integer given by 

a = 1 + 2''i + 2''i+''2 _^ 1_ 26i+-+6m-i ^ 
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and the exponent k is given by 

k = bi + --- + b, 



Thus, the rational u/v, for 1 < n < v, has a unique binary continued fraction expansion of the 
form (9). VaUee studies three parameters related to this continued fraction: 

1. The height or the depth (i.e. the number of exchanges) r. 

2. The total number of operations necessary to obtain the expansion (equivalently, the num- 
ber of times step B2 of Algorithm B is performed): if p{a) denotes the number of "l"s in 
the binary expansion of the integer a, it is equal to p{ai) + p{a2) + • • • + p{ar). 

3. The total number of one-bit shifts, i.e. the sum of exponents of 2 in the numerators of the 
binary continued fraction, ki + ■ ■ ■ + kr- 



5.3 Vallee's Theorems 

Vallee's main results give the average values of the three parameters above: the average values 
are asymptotically Ai In N for certain computable constants Ai,A2, A3 related to the spectral 
properties of an operator V2 which is defined in (11) of §5.4. Clearly the constant K of §4 is 
A2 In 2. 



5.4 Some Useful Operators 

Operators Bg, Vg, Ug, Us-, useful in the analysis of the binary Euclidean algorithm, are defined 

by 

^s[m = T.{{z:h^\ f{z:h^) + { > (10) 



fe>i 



a + 2^xj \a + 2^ 



Vs[f\{x) = Y. E (:7tW)' /(tttW) ' (11) 



i;>l a odd, 
0<a<2*: 



".i/iw = g(T^)'/(r^). (12) 

M.I/lW = frVw.[/lfrl ■ (13) 



In these definitions s is a complex variable, and the operators are called Ruelle operators [24]. 
They are linear operators acting on certain function spaces. It is immediate from the definitions 
that 

Bs=Us+Us, (14) 

The case s = 2 is of particular interest. B2 encodes the effect of one iteration of the inner "while" 
loop of Algorithm V, and V2 encodes the effect of one iteration of the outer "while" loop. See 
Vallee [29, 30] for further details. 



5.5 History and Notation 

B2 (denoted T) was introduced in [4], and was generalised to Bg by Vallee. Vg was introduced 
by Vallee [29, 30]. We shall call 

• Bg (or sometimes just B2) the binary Euclidean operator and 

• Vs (or sometimes just V2) Vallee's operator. 
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5.6 Relation Between the Operators 

The binary Euclidean operator and Vallee's operator are closely related, as Lemma 1 and The- 
orem 1 show. 

Lemma 1 
Proof. From (11), 



k>l a odd, 
0<a<2'= 



but, from (12) and (13), 

On substituting y = l/{a + 2*^x) we obtain 

VsiUsimx) = E E E {i + 2^a + 2k+mx) ^ (l + 2"^a + 2'=+"^x) " 

fe>l a odd, rn>l 
0<a<2fc 

Thus, to show that 

Vs[f]{x) =Us[f]{x) +VsUs[f]{x) 
it suffices to observe that the set of polynomials 

{a' + 2^'x I A;' > 1, a' odd, < a' < 2*='} 

is the disjoint union of the two sets 

{l + 2^x I A > 1} 

and 

{1 + 2™a + 2'=+™x I A: > 1, m > 1, a odd, < a < 2^=}. 

To see this, consider the two cases a' = 1 and a' > 1. If 2*^' > a' > 1 we can write a' = 1 + 2'^a, 
k' = k + m, for some (unique) odd a and positive A, m. □ 

5.7 Algorithmic Interpretation 

Algorithm V gives an interpretation of Lemma 1 in the case s = 2. If the input density of 

X = u/v is f{x) then execution of the inner "while" loop followed by the exchange of u and 
V transforms this density to V2[f]{x). However, by considering the first iteration of this loop 
(followed by the exchange if the loop terminates) we see that the transformed density is given 

by 

V2U2[fKx)+U2[fKx), 

where the first term arises if there is no exchange, and the second arises if an exchange occurs. 
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5.8 Consequence of Lemma 1 

The following Theorem gives a simple relationship between Bg, Vg and Ug- 
Theorem 1 

{Vs-I)Us = VsiBs-I) . 
Proof. This is immediate from Lemma 1 and the definitions of the operators. □ 

5.9 Fixed Points 

It follows immediately from Theorem 1 that, if 

9 = U2f, (15) 

then 

(V2-X)5 = V2(i32-X)/. 

Thus, if / is a fixed point of the operator B2, then 51 = is a fixed point of the operator 
V2- (We can not assert the converse without knowing something about the mih space of V2-) 
Prom a result of Vallee [30, Prop. 4] we know that V2, acting on a certain Hardy space 
has a unique positive dominant simple eigenvalue I, so g must be (a constant multiple of) the 
corresponding eigenfunction (provided g G T-L^iV)). 

Lemma 2 If f is a fixed point of B2 and g is given by (15), then 

/(l) = 2,(l)=2g(^)'/(^). 

Proof. This is immediate from the definitions of B2 and W2. □ 



6 A Result of Vallee 

Using her operator V^, Vallee [30] proved that 

K = 



21n2 (16) 



a>0 



where g is a nonzero fixed point of V2 (i.e. g = V2g 7^ 0) and G{x) = Jq g{t) dt . This is the 
only expression for K which has been proved rigorously. 

Because Vg has nice spectral properties, the existence and uniqueness (up to scaling) of g 
can be established. 



6.1 A Conjecture of Vallee 

Let 

A = /(I) , (17) 

where / is the limiting probability density (conjectured to exist) as in §4. A and K are fun- 
damental constants which are not known to have simple closed form expressions - to evaluate 
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them numerically we seem to have to approximate a probability density f{x) (or g{x)) or the 
corresponding distribution F{x) (or G{x)). Vallee (see Knuth [20, §4.5.2(61)]) conjectured that 

A 2In2 



b 7r2 ' 

where b is given by (7) or (8). Equivalently, from (6), her conjecture is that 

KX='-^. (18) 

Vallee proved the conjecture under the assumption that the operator Bs satisfies a "spectral 
gap" condition which has not been proved, but which is plausible because it is known to be 
satisfied by V^. Specifically, a sufficient condition is that the operator, acting on a suitable 
function space, has a simple positive dominant eigenvalue Ai , and there is a positive e such that 
all other eigenvalues Xj satisfy |Aj| < Ai — e. 



7 Some Relations Between Fixed Points 

In this section we assume that / is a fixed point of the operator B2, g = W2/ as in §5.9 is a 
fixed point of the operator V2, and both / and g are analytic functions (not necessarily regular 
at X = 0). Using analyticity we extend the domains oi f, g etc to include the positive real axis 
(0,+(X)). Let 

F{x)= r f{t)dt 
Jo 

and 

G{x)= r g{t)dt 
Jo 

be the corresponding integrals. By scaling, we can assume that 

F(l) = 1 

but, in view of (15), we are not free to scale g. (See (20) below.) 
Prom the definition (12) of Us and (15), we have 

1X2 



k=l 

so, integrating with respect to x, 



0W=|2-(F(l)-f(^ 



This simplifies to 

Although our derivation of (19) assumes x G [0, 1], we can use (19) to give an analytic continu- 
ation of G{x). Allowing x to approach +00, we see that there exists 

lim Gix) = G(+oo) 

say, and 

G{+oo) = 1 . (20) 
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We can use the functional equation (2) to extend the domain of definition of F{x) to the 
nonnegative real axis [0, +00). It is convenient to work with -F(x) = 1 — F(^x) rather than with 
F(x) because of the following result. 

Lemma 3 

F{x) = G{l/x) - G{x) 

and consequently 

F{l/x) = -F{x) . 

Proof. This is immediate from (2) and (19). □ 
Lemma 4 



k=l a odd, 

0<a<2fc 



a J Va + 2'^x 



Proof. Since g{x) is a fixed point of V2, we have 

g{x) = V2[gKx) 



k=l a odd, 
0<a<2fc 



a + 2^xj \a + 2k 



"-x 



Integrating with respect to x and making the change of variable u = l/(a + 2 x) gives 

G{x) = Y,2-^ Y: / ^ giu)du, 

t^l a odd, JlKa+2^x) 

0<a<2fc 



and the result follows. □ 
Lemma 5 

£2- E G(i)=l. 



k=\ a odd, 

0<a<2fc 



Proof. Let x +00 in Lemma 4 and use (20). □ 

The sum occurring in the following Lemma is the same as the sum in (16). To avoid confusion 
we repeat that our normalisation of G is different from Vallee's, but note that the right side of 
(16) is independent of the normalisation of G because ^(1) appears in the denominator. 

Lemma 6 



^ 2-Lig«JGQ] =1. 



a odd, 
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Proof. We can write Lemma 5 as 



a odd 
a>0 



where 

Ca= 2"'^ = 2- Lis "J 



fe>i, 

2'=>a 



Thus, the sums occurring in Lemma 5 and Lemma 6 are identical. □ 

Theorem 2 Under the assumptions stated at the beginning of this section, the expressions (16) 
and (18) are equivalent. 



Proof. This follows immediately from Lemmas 2 and 6. □ 

Remarks. As noted above, Vallec proved (18) under an assumption about the spectrum of Bg- 
Our proof of Theorem 2 is more direct. We are not able to prove the equivalence of (6) and (18), 
but (as described in §8) it has been verified numerically to high precision. 



8 Numerical Results 

Using an improvement of the "discretization method" of [4] , with Richardson extrapolation (see 
§8.1) and the equivalent of more than 50 decimal places (50D) working precision, we computed 
the hmiting probability distribution F, then K (using (6) and (8)), A = /(I), and K\. The 
results were 

K = 0.7059712461 0191639152 9314135852 8817666677 
A = 0.3979226811 8831664407 6707161142 6549823098 
KX = 0.2809219710 9073150563 5754397987 9880385315 

These are believed to be correctly rounded values. 

The computed value of KX agrees with 41n2/7r^ to 40 decimals^, in complete agreement 
with Vallee's conjecture (18). 

8.1 Some Details of the Numerical Computation 

A consequence of Lemma 3 is that F(e~^) is an odd function of y. This fact was exploited in 
the numerical computations. By discretising with uniform stepsize h in the variable y we can 
obtain K with error 0(/i^''+^) after r Richardson/Romberg [16, 18] extrapolations, because the 
error has an asymptotic expansion containing only even powers of h. 

In fact, we found it better to take a uniform stepsize h in the variable z = ^Jy., i.e. make the 
change of variables 

X = exp(— z^) 

because this puts more points near x = and less points in the "tail" . We truncated at a point 

Zmax sufficiently large that cxp(— z^^^) was negligible. 

To obtain 40D results it was sufficient to take Zmax = 11, h = z„iaxl'^^ , r = 7, iterate 
the recurrence for F„ 81 times (using interpolation by polynomials of degree 2r + 1 where 
necessary) to obtain F ^ Fsi to 0{h^^) accuracy. Using the trapezoidal rule, we obtained K 

®In fact the agreement is to 44 decimals. 
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by numerical quadrature to 0{h?) accuracy, and then applied seven Richardson extrapolations 
(using the results for stepsize h,2h,2^h, ■ ■ ■ to obtain K with error 0{h^^). Similarly, we 

approximated A = F'(l) by 

Af^F(exp(-/i2))//i2 

and then used extrapolation. Only three Richardson extrapolations were needed to obtain A 
with error 0{h^^) because the relevant asymptotic expansion includes only powers of h^. 

8.2 Subdominant eigenvalues 

In order to estimate the speed of convergence of /„ to / (assuming / exists), we need more 
information on the spectrum of 82- What can be proved ? 

Preliminary numerical results indicate that the sub-dominant eigenvalue(s) are a complex 
conjugate pair: 

A2 = A3 = 0.1735 ± 0.0884Z , 

with IA2I = IA3I = 0.1948 to 4D. 

The appearance of a complex conjugate pair is interesting because in the classical case it 
is known that the eigenvalues are all real, and conjectured that (when ordered in decreasing 
absolute value) they alternate in sign [13]. 

8.3 Complexity of approximating K 

We have several expressions for K which are conjectured to be equivalent. Which is best for 
numerical computation ol K 7 Suppose we want to estimate K to n-bit accuracy, i.e. with error 
0(2-"). 

We could iterate the recurrence 

gk+i{x) = V2[gk]{x) 

to obtain the principal fixed point g{x) of Vallee's operator V2- However, the sum over odd a in 
the definition of V2 appears to require the summation of exponentially many terms. Similarly, 
the sum in (16) appears to require exponentially many terms (unless we can assume that g is 
scaled so that Lemma 6 applies). 

Thus, it seems more efficient numerically to approximate the principal fixed point f{x) of 
the binary Euclidean operator 82 or the corresponding integral F{x) (or F(x) = 1 — F{x)), even 
though the existence of / or F has not been proved. 

It seems likely that f{x) is unbounded in a neighbourhood of x = 0, so it is easier numerically 
to work with F{x). In the sum (2), the terms arc bounded by 2"^, so we need take only 0(n) 
terms to get n-bit accuracy. (If we used the recurrence (3) it would not be so clear how many 
terms were required.) 

Assuming that B2 has a positive dominant simple eigenvalue 1 (as seems very likely) , conver- 
gence of F^ix) to F{x) is linear, so 0(n) iterations are required. We have to tabulate F{x) at a 
sufficiently dense set of points that the value at any point can be obtained to sufficient accuracy 
by interpolation. If the scheme of §8.1 is used, it may be sufficient to take h = 0{l/n) and use 
polynomial interpolation of degree 0(n/ log n). (Here [20, ex. 4.5.2.25] may be relevant.) 

The final step, of estimating the integral (8) and A = /(I), can be done as in §8.1 with a 
relatively small amount of work. Alternatively, we can avoid the computation of an integral 
by using (18). However, the independent computation of K and A provides a good check on 
the numerical results, since it is unlikely that any errors in the computed values of K and/or A 
would be correlated in such a way as to leave the product KX unchanged. 

Overall, the work required to obtain an n-bit approximation to K appears to be bounded 
by a low-degree polynomial in n. Probably O(n^) bit operations are sufficient. It would be 
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interesting to know if significantly faster algorithms exist. For example, is it possible to avoid 
the computation of F{x) or a similar function at a large number of points ? 



9 Correcting an Error 



In [4] it was claimed that, for all n > and x G (0, 1], 



Fn{x) = an{x) lg(x) + /3n(x) 



(21) 



where an{x) and (3n{x) are analytic and regular in the disk \x\ < 1. However, this is incorrect, 
even in the case n = 1. 

The error appeared to go unnoticed until 1997, when Knuth was revising Volume 2 in prepa- 
ration for publication of the third edition. Knuth computed the constant K using recurrences for 
the analytic functions an{x) and I3n{x), and I computed K directly using the defining integral 
and recurrences for -F„(x). Our computations disagreed in the 14th decimal place ! Knuth found 



We soon discovered the source of the error. It was found independently, and at the same 
time, by Flajolet and Vallee. 

The source of the error is illustrated by [4, Lemma 3.1], which is incorrect, and corrected in 
[20, solution to ex. 4.5.2.29]. In order to explain the error, we need to consider Mellin transforms 
(a very useful tool in average-case analysis [12]). 

9.1 Mellin Transforms and Mellin Inversion 

The Mellin transform of a function^ g{x) is defined by 



K = 0.70597 12461 01945 99986 • • • 



but I found 



K = 0.70597 12461 01916 39152 • • • 




It is easy to see that, if 



x) , 



k>l 



then the Mellin transform of / is 



ris) 



2S+1 _ 1 • 



k>l 



Under suitable conditions we can apply the Mellin inversion formula to obtain 




Applying these results to 



g{x) = l/{l + x) 



6i 



The functions / and g here are not necessarily related to those occurring in other sections. 
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whose Mellin transform is 

g*{s) = tt/ siriTTS when < TZs < 1 , 

we find 



«>i 



as a sum of residues of 

TT 



siUTTS 7 2*+^ — 1 



for TZs < 0. This gives 



(23) 



X 2 4 

f{x) = l + xlgx-{ \-xP{lgx) + -x^ , (24) 

2 13 

where 

_ sin2n7rt 
^ ' ^h[2^^ sinh(2n7r2/ln2) ' ^ ^ 

9.2 The "Wobbles" Caused by P{t) 
P{t) is a very small periodic function: 

\P{t)\ < 7.8 X 10-^^ 

for real t. In [4, Lemma 3.1], the term xP{\gx) in (24) was omitted. Essentially, the poles of 
(23) off the real axis at 

s = -l±— — , n = l,2,... 
In2 

were ignored.^ 

Because of the sinh term in the denominator of (25), the residues at the non-real poles are 
tiny, and numerical computations performed using single-precision floating-point arithmetic did 
not reveal the error. 



9.3 Details of Corrections 

The function j{x) of (22) is called D\{x) in [4]. In (3.29) of [4, Lemma 3.1], the expression for 
D\{x) is missing the term xP(lgx). 

Equation (3.8) of [4] is (correctly) 

Fn{x) = l + i^„(l/x) -Z5„(x) 
so in Corollary 3.2 the expression for F\{x) is missing a term — xP(lgx). 

The statement following Corollary 3.2 of [4], that "In principle we could obtain F2(x), F3(x), 
etc in the same way as -Fi(x)" is dubious because it is not clear how to handle the terms involving 
P(lgx). 

To quote Gauss [notebook, 1800] , who was referring to (x) etc for the classical algorithm: 
Tam complicatcB evadunt, ut nulla spes superesse videatur.^ 

Corollary 3.3 of [4], that -F„+i 7^ F„, is probably correct, but the proof given is incorrect 
because it assumes the incorrect form (21) for Fn{x). 

^In fact, the incorrect result was obtained without using Melhn transforms. If I had used them I probably 
would have obtained the correct result! 

*They come out so complicated that no hope appears to be left. 
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9.4 An Analogy 

Ramanujan made a similar error when he gave a formula for 7r(x) (the number of primes < x) 

which essentially ignored the residues of C,' [s) / C,{s) arising from zeros of C,{s) off the real axis. 
For further details we refer to Berndt [1], Hardy [17], Riesel [23, Ch. 1-3] and the references 
given there. 

10 Conclusion and Open Problems 

Since Vallee's recent work [29, 30], analysis of the average behaviour of the binary Euclidean 
algorithm has a rigorous foundation. However, some interesting open questions remain. 

For example, does the binary Euclidean operator B2 have a unique positive dominant simple 
eigenvalue 1? Vallee [30, Prop. 4] has proved the corresponding result for her operator V2. Are 
the various expressions for K given above all provably correct ? (Only (16) has been proved.) 
Is there an algorithm for the numerical computation of K which is asymptotically faster than 
the one described in §8.1 ? How can we give rigorous error bounds on numerical approximations 
toKl 

In order to estimate the speed of convergence of fn to / (assuming / exists), we need more 
information on the spectrum of H2. What can be proved ? As mentioned in §8.2, numerical 
results indicate that the sub-dominant eigenvalue(s) are a complex conjugate pair with absolute 
value about 0.1948. 

It would be interesting to compute the spectra of B2 and V2 numerically, and compare with 
the classical case, where the spectrum is real and the eigenvalues appear to alternate in sign. 

In order to give rigorous numerical bounds on the spectra of B2 and wc need to bound 
the error caused by making finite-dimensional approximations to these operators. This may be 
easier for V2 than for B2- 
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